ABSTRACT



In order to detect a face disposed within a digital image, the 
pixels of the image are grouped based on whether they are 
skin color. The edges of the skin colored areas are removed 
by eliminating pixels that have surrounding pixels with a 
high variance in the luminance component. The resulting 
connected components are classified to determine whether 
they could include a face. The classification includes exam- 
ining: the area of the bounding box of the component, the 
aspect ratio, the ratio of detected skin to the area of the 
bounding box, the orientation of elongated objects, and the 
distance between the center of the bounding box and the 
center of mass of the component. Components which are 
still considered facial candidates are mapped on to a graph. 
The minimum spanning trees of the graphs are extracted and 
the corresponding components which remain are again clas- 
sified for whether they could include a face. Each graph is 
split into two by removing the weakest edge and the corre- 
sponding components which remain are yet again classified. 
The graph is continually broken down until a bounding box 
formed around the resulting graphs is smaller than a thresh- 
old. Finally, a heuristic is performed to eliminate false 
positives. The heuristic compares the ratio of pixels with 
high variance to the total number of pixels in a face 
candidate component. 

5 Claims, 9 Drawing Sheets 
METHOD FOR DETECTING A FACE IN A DIGITAL IMAGE

BACKGROUND OF THE INVENTION

This invention relates to the field of image detection and
more specifically to the detection of a face disposed within
a digital image.

As broadcasting becomes more digitally based, it
becomes easier to archive and catalog video content.
Researchers have developed systems for content-based
image and video indexing and retrieval that utilize low-level
visual features (semantics) like color, texture, shape and
sketch of an image. To facilitate the automatic archiving and
retrieval of video material based on higher level semantics,
it is important to detect and recognize events in video clips.
Human activities are important events in video clips and
face detection is a step toward the recognition of human
activities.

Face detection is also useful in security systems, criminal
identifications, digital image capturing, and teleconferences.
In a security system, for example, it is useful to detect the
facial portions of an image being viewed so that an operator
of the system can discern whether a human is present in the
image.

The detection of faces from images has not received much
attention by researchers. Most conventional techniques
concentrate on face recognition and assume that a face has been
identified within the image or assume that the image only
has one face as in a "mug shot" image. Such conventional
techniques are unable to detect faces from complex
backgrounds.

One prior art technique that does perform face detection
determines whether a cluster of pixels conforms to a facial
template. This routine is deficient because of different scales
and orientations of possible faces. The template itself is one
size and orientation and will not detect faces which are of a
different size or are rotated. Consequently, the template itself
must be scaled up and down and rotated while searching is
performed, yielding a search space that is too big to be useful
or practical. Some prior art techniques, like EPA 0836326
A2, use merely a shape template to see if a cluster of pixels
conforms to that shape. In addition to the scaling and
rotation problems mentioned above, this solution is too
simplistic to be used with complex backgrounds which may
have many objects with the same shape as a face and perhaps
even the same color as a face.

Therefore, there exists a need for a method of detecting
faces within a digital image in which the face is disposed
within a complex background.

SUMMARY AND OBJECTS OF THE INVENTION

One aspect of the invention is a method for detecting a
face disposed within a digital image. The method comprises
providing a digital image composed of a plurality of pixels
and producing a binary image from the digital image by
detecting skin colored pixels. The method further includes
removing pixels corresponding to edges in the luminance
component thereby producing binary image components;
mapping the binary image components into at least one
graph; and classifying the mapped binary image components
as facial and non-facial types wherein the facial types serve
as facial candidates.

In this way, as well as with the following aspects of the
invention, a face disposed within a digital image can be
detected.

Another aspect of the invention is a computer readable
storage medium containing data for performing the steps of
providing a digital image composed of a plurality of pixels
and producing a binary image from the digital image by
detecting skin colored pixels. The computer readable storage
medium further contains data for removing pixels
corresponding to edges in the luminance component thereby
producing binary image components; mapping the binary
image components into at least one graph; and classifying
the mapped binary image components as facial and
non-facial types wherein the facial types serve as facial
candidates.

Yet another aspect of the invention is a method for
detecting a face disposed within a digital image. The method
comprises providing a digital image composed of a plurality
of pixels and producing a binary image from the digital
image by detecting skin colored pixels. The method further
includes removing pixels corresponding to edges in the
luminance component thereby producing binary image
components; and classifying each of the binary image
components as one of a facial type and a non-facial type. The
classifying includes forming a bounding box around a
classified component of the components and performing at
least one of: comparing an area of the bounding box to a
bounding box threshold; comparing an aspect ratio of the
bounding box to an aspect ratio threshold; determining an
area ratio, the area ratio being the comparison between the
area of the classified component and the area of the
bounding box, and comparing the area ratio to an area ratio
threshold; determining an orientation of elongated objects
within the bounding box; and determining a distance
between a center of the bounding box and a center of the
classified component.

It is an object of the invention to provide a method for
detecting a face disposed within a digital image.

This object, as well as others, will become more apparent
from the following description read in conjunction with the
accompanying drawings where like reference numerals are
intended to designate the same elements.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a cylindrical coordinate system
used for [illegible];

FIGS. 2A, 2B, and 2C are three graphs representing
projections of the YUV color domain indicating the areas
where skin colored pixels fall;

FIGS. 3A-3F are original images and their respective binary
images, the binary images being formed by grouping pixels
by color;

FIG. 4 is a diagram illustrating how a 3x3 mask is used
as part of luminance variation detection in accordance with
the invention;

FIGS. 5A and 5B are diagrams illustrating 4 and 8 type
connectivity, respectively;

FIGS. 6A and 6B are images showing what the image of
FIGS. 3C and 3E would look like after the edges are
removed in accordance with the invention;

FIG. 7 is an image showing examples of bounding boxes
applied to the image of FIG. 3F;

FIG. 8 is a sequence of diagrams showing how
components of an image are represented by vertices and connected
to form a graph in accordance with the invention;

FIGS. 9A-9D are a sequence of images illustrating the
application of a heuristic according to the invention;

FIG. 10 is a flow chart detailing the general steps involved
in the invention; and
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Each pixel in an image is generally represented in the
HSV (hue, saturation, value) color domain. These values are
mapped onto a cylindrical coordinate system as shown in
FIG. 1 where P is the value (or luminance), θ is the hue, and
r is the saturation. Due to the non-linearity of cylindrical
coordinate systems, other color spaces are used to
approximate the HSV space. In the present applications, the YUV
color space is used because most video material stored on a
magnetic medium and the MPEG2 standard both use this
color space.

Transforming an RGB image into the YUV domain, and
further projecting into the VU, VY, and YU planes, produces
graphs like those shown in FIGS. 2A, 2B, and 2C. The circle
segments represent the approximation of the HSV domain.
When pixels corresponding to skin color are graphed in the
YUV space, they generally fall into those circle segments
shown. For example, when the luminance of a pixel has a
value between 0 and 200, the chrominance U generally has
a value between -100 and 0 for a skin colored pixel. These
are general values based on experimentation. Clearly, a color
training operation could be performed for each camera being
used to capture images that include faces. The results of that
training would then be used to produce more precise skin
colored segments.

To detect a face, each pixel in an image is examined to
discern whether it is skin colored. Those pixels which are
skin colored are grouped from the rest of the image and are
thus retained as potential face candidates. If at least one
projection of a pixel does not fall within the boundaries of
the skin cluster segment, the pixel is deemed not skin
colored and removed from consideration as a potential face
candidate.

The resultant image formed by the skin color detection is
binary because it shows either portions of the image which
are skin color or portions which are not skin color as shown
in FIGS. 3B, 3D, and 3F which correspond to original
images in FIGS. 3A, 3C, and 3E respectively. In the figures,
white is shown for skin color and black for non-skin color.
As shown in FIGS. 3A and 3B, this detecting step alone may
rule out large portions of the image as having a face disposed
within it. Prior art techniques which use color and shape may
thus work for simple backgrounds like that shown in FIG.
3A. However, looking at FIGS. 3C and 3D and FIGS. 3E and
3F, it is clear that detection by color and shape alone may not
be sufficient to detect the faces. In FIGS. 3C-3F, objects in
the background like leather, wood, clothes, and hair have
colors similar to skin. As can be seen in FIGS. 3D and 3F,
these skin colored objects are disposed immediately adjacent
to the skin of the faces and so the faces themselves are
difficult to detect.

After the pixels are grouped by color, the pixels located on
edges are removed from consideration. An edge is a change
in the brightness level from one pixel to the next. The
removal is accomplished by taking each skin colored pixel
and calculating the variance in the pixels around it in the
luminance component; a high variance being indicative of
an edge. As is shown in FIG. 4, a box ("window"), the size
of either 3x3 or 5x5 pixels, is placed on top of a skin colored
pixel. Clearly, other masks besides a square box could be
used. The variance is defined as

$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_x)^2$$

where $\mu_x$ is the average of all the pixels in the examined
window. A high variance level will be different depending
upon the face and the camera used in broadcasting the digital
image. Therefore, an iterative routine is used starting with a
very high variance level and working down to a low variance
level.

At each step of the variance iteration, pixels are removed
from facial consideration if the variance in the window
around the skin colored pixel is greater than the variance
threshold being tested for that iteration. After all of the
pixels are examined in an iteration, the resulting connected
components are examined for facial characteristics as is
described more fully below. Connected components are
pixels which are of the same binary value (white for facial
color) and connected. Connectivity can be either 4 or 8 type
connectivity. As shown in FIG. 5A, for 4 type connectivity,
the center pixel is considered "connected" to only pixels
directly adjacent to it as is indicated by the "1" in the
adjacent boxes. In 8 type connectivity, as is shown in FIG.
5B, pixels diagonally touching the center pixel are also
considered to be "connected" to that pixel.

As stated above, after each iteration, the connected
components are examined in a component classification step to
see if they could be a face. This examination involves
looking at 5 distinct criteria based upon a bounding box
drawn around each resulting connected component,
examples of which are shown in FIG. 7 based on the image
of FIG. 3F. The criteria are:

1. The area of the bounding box compared to a threshold.
This recognizes the fact that a face will generally not be very
large or very small.

2. The aspect ratio (height compared to the width) of the
bounding box compared to a threshold. This recognizes that
human faces generally fall into a range of aspect ratios.

3. The ratio of the area of detected skin colored pixels to
the area of the bounding box, compared to a threshold. This
criterion recognizes the fact that the area covered by a human
face will fall into a range of percentages of the area of the
bounding box.

4. The orientation of elongated objects within the
bounding box. There are many known ways of determining the
orientation of a series of pixels. For example, the medial axis
can be determined and the orientation can be found from that
axis. In general, faces are not rotated significantly about the
axis ("z-axis") which is perpendicular to the plane having
the image and so components with elongated objects that are
rotated with respect to the z-axis are removed from
consideration.

5. The distance between the center of the bounding box
and the center of mass of the component being examined.
Generally, faces are located within the center of the
bounding box and will not, for example, be located all to one
side.

The iterations for variance are continued, thereby breaking
down the image into smaller components until the size of the
components is below a threshold. The images of FIGS. 3C
and 3E are shown transformed in FIGS. 6A and 6B
respectively after the variance iteration process. As can be
discerned, faces in the image were separated from the
non-facial skin colored areas in the background as a result of
the variance iteration. Frequently, this causes the area with
detected skin color to be fragmented as is exemplified in
FIG. 6B. This occurs because either there are objects
occluding portions of the face (like eyeglasses or facial hair) or
because portions were removed due to high variance. It
would thus be difficult to look for a face using the resulting
components by themselves. The components that still can be
part of a face after the variance iteration and component
classification steps are mapped onto a graph as shown in
FIG. 8. In this way, skin colored components that have
similar features, and are close in space, are grouped together
and then further examined.
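
The variance iteration used for edge removal can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the implementation of the invention: the image is assumed to be given as a list of rows of luminance values (`lum`) plus a binary skin mask (`mask`), the window is clipped at the image border, and the function names `local_variance`, `remove_edges`, and `variance_iteration` are hypothetical. The component classification and the component size stopping test that follow each iteration are left to the caller.

```python
from statistics import pvariance


def local_variance(lum, row, col, half=1):
    """Variance of luminance in the window centred on (row, col).

    half=1 gives a 3x3 window and half=2 a 5x5 window. Clipping the
    window at the image border is an assumption of this sketch.
    """
    rows, cols = len(lum), len(lum[0])
    window = [
        lum[r][c]
        for r in range(max(0, row - half), min(rows, row + half + 1))
        for c in range(max(0, col - half), min(cols, col + half + 1))
    ]
    return pvariance(window)  # (1/n) * sum((x_i - mean)^2)


def remove_edges(mask, lum, threshold, half=1):
    """Clear skin pixels whose window variance exceeds the threshold."""
    return [
        [
            1 if mask[r][c] and local_variance(lum, r, c, half) <= threshold else 0
            for c in range(len(mask[0]))
        ]
        for r in range(len(mask))
    ]


def variance_iteration(mask, lum, thresholds, half=1):
    """Yield (threshold, mask) pairs, working from high to low thresholds."""
    for threshold in sorted(thresholds, reverse=True):
        mask = remove_edges(mask, lum, threshold, half)
        yield threshold, mask
```

A caller would iterate over `variance_iteration(...)`, label the connected components of each yielded mask, classify them, and stop once the components fall below the chosen size threshold.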

Referring to FIG. 8, each resulting component (that
survives the color detecting, edge removal, and component
classification steps) is represented by a vertex of a graph.
Vertices are connected if they are close in space in the
original image and if they have a similar color in the original
image. Two components, i and j, have a similar color if:

$$|Y_i - Y_j| < t_y \;\wedge\; |U_i - U_j| < t_u \;\wedge\; |V_i - V_j| < t_v$$

where $Y_n$, $U_n$, and $V_n$ are the average values of the
luminance and chrominance of the $n$-th component and $t_y$,
$t_u$, and $t_v$ are threshold values. The thresholds are based
upon variations in the Y, U, and V values in faces and are kept
high enough so that components of the same face will be
considered similar. Components are considered close in space if
the distance between them is less than a threshold. The spatial
requirement ensures that spatially distant components are not
grouped together because portions of a face would not
normally be located in spatially distant portions of an image.
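
The two connection tests can be sketched as follows. This is an illustration only: the `Component` type, the use of centroid-to-centroid distance as the measure of closeness, and all parameter names are assumptions that do not appear in the text above.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Component:
    """Summary of one surviving skin colored component (hypothetical type)."""
    y: float   # average luminance Y
    u: float   # average chrominance U
    v: float   # average chrominance V
    cx: float  # centroid column in the original image
    cy: float  # centroid row in the original image


def similar_color(a, b, t_y, t_u, t_v):
    """All three average values must differ by less than their threshold."""
    return (
        abs(a.y - b.y) < t_y
        and abs(a.u - b.u) < t_u
        and abs(a.v - b.v) < t_v
    )


def distance(a, b):
    """Euclidean distance between centroids (an assumed distance measure)."""
    return hypot(a.cx - b.cx, a.cy - b.cy)


def build_graph(components, t_y, t_u, t_v, max_distance):
    """Return edges (i, j, weight) between similar, nearby components."""
    edges = []
    for i, a in enumerate(components):
        for j in range(i + 1, len(components)):
            b = components[j]
            d = distance(a, b)
            if d < max_distance and similar_color(a, b, t_y, t_u, t_v):
                edges.append((i, j, d))
    return edges
```

Each vertex is simply the index of a component in the input list, and the returned edge list may describe one graph or several disjoint graphs.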

The connection between vertices is called an edge. Each
edge is given a weight which is proportional to the Euclidean
distance between the two vertices. Connecting the vertices
together will result in a graph or a set of disjointed graphs.
For each of the resulting graphs, the minimum spanning tree
is extracted. The minimum spanning tree is generally
defined as the subset of a graph where all of the vertices are
still connected and the sum of the lengths of the edges of the
graph is as small as possible (minimum weight). The
components corresponding to each resulting graph are classified
as either face or not face using the shape parameters defined
in the component classification step mentioned above except
that now all the components in a graph are classified as a
whole instead of one component at a time. Then each graph
is split into two graphs by removing the weakest edge (the
edge with the greatest weight) and the corresponding
components of the resulting graphs are examined again. The
division continues until the area of a bounding box formed
around the resultant graphs is smaller than a threshold.
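
The extraction and splitting steps can be sketched with Kruskal's algorithm, which is one standard way to obtain a minimum spanning tree; the text above does not name a particular algorithm. Edges are assumed to be `(i, j, weight)` tuples over integer vertex indices, and the function names are hypothetical.

```python
def minimum_spanning_forest(n_vertices, edges):
    """Kruskal's algorithm; returns the tree edges in order of weight.

    If the input holds several disjoint graphs, the result holds one
    minimum spanning tree per graph.
    """
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for i, j, w in sorted(edges, key=lambda e: e[2]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two separate trees, keep it
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree


def split_at_weakest_edge(tree):
    """Remove the greatest-weight edge from a single tree.

    Returns (vertices_a, vertices_b, remaining_edges).
    """
    weakest = max(tree, key=lambda e: e[2])
    remaining = [e for e in tree if e is not weakest]
    neighbours = {}
    for i, j, _ in remaining:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    # Collect every vertex still reachable from one end of the removed edge.
    side_a, stack = set(), [weakest[0]]
    while stack:
        v = stack.pop()
        if v not in side_a:
            side_a.add(v)
            stack.extend(neighbours.get(v, ()))
    all_vertices = {v for i, j, _ in tree for v in (i, j)}
    return side_a, all_vertices - side_a, remaining
```

Each of the two vertex sets would then be classified as a whole, and the split repeated on each side until the bounding box around it falls below the chosen threshold.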

By breaking down and examining each graph for a face,
a set of all the possible locations and sizes of faces in an
image is determined. This set may contain a large number of
false positives and so a heuristic is applied to remove some
of the false positives. Looking for all the facial features (i.e.
nose, mouth, etc.) would require a template which would
yield too large of a search space. However, experimentation
has shown that those facial features have edges with a high
variance. Many false positives can be removed by
examining the ratio of high variance pixels inside a potential face
to the overall number of pixels in the potential face.

The aforementioned heuristic is effectuated by first
applying a morphological closing operation to the facial
candidates within the image. As is known in the art, a mask is
chosen and applied to each pixel within a potential facial
area. For example, a 3x3 mask could be used. A dilation
algorithm is applied to expand the borders of face candidate
components. Then an erosion algorithm is used to eliminate
pixels from the borders. One with ordinary skill in the art
will appreciate that these two algorithms, performed in this
order, will fill in gaps between components and will also
keep the components at substantially the same scale. Clearly,
one could perform multiple dilation and then multiple
erosion steps as long as both are applied an equal number
of times.
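
A closing operation with a 3x3 mask can be sketched in a few lines. This is an illustrative sketch only: the binary image is assumed to be a list of rows of 0/1 values, positions outside the image are treated as background, and the names `dilate`, `erode`, and `closing` are hypothetical.

```python
def _pixel(mask, r, c):
    """Value at (r, c); positions outside the image count as background."""
    if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
        return mask[r][c]
    return 0


def _apply(mask, combine):
    """Apply `combine` (any or all) over each pixel's 3x3 neighbourhood."""
    offsets = (-1, 0, 1)
    return [
        [
            1 if combine(_pixel(mask, r + dr, c + dc)
                         for dr in offsets for dc in offsets) else 0
            for c in range(len(mask[0]))
        ]
        for r in range(len(mask))
    ]


def dilate(mask):
    """A pixel is set if any pixel under the mask is set."""
    return _apply(mask, any)


def erode(mask):
    """A pixel stays set only if every pixel under the mask is set."""
    return _apply(mask, all)


def closing(mask, times=1):
    """Dilate `times` times, then erode the same number of times."""
    for _ in range(times):
        mask = dilate(mask)
    for _ in range(times):
        mask = erode(mask)
    return mask
```

On a mask with two blocks separated by a one-pixel gap, `closing` fills the gap while leaving the outer extent of the blocks unchanged.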

Now, the ratio of pixels with a high variance neighbor- 
hood inside the face candidate area is compared to the total 
number of pixels in the face candidate area. Referring to 
FIGS. 9A to 9D, an original image in FIG. 9A is examined
for potential face candidates using the methods described 
above to achieve the binary image shown in FIG. 9B. The 
morphological closing operation is performed on the binary 
image resulting in the image shown in FIG. 9C. Finally, 
pixels with high variance located in the image of FIG. 9C are 
detected as is shown in FIG. 9D. The ratio of the high 
variance pixels to the total number of pixels can then be 
determined. 
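
The ratio itself can be computed as in the following sketch. It assumes the closed face candidate is a binary mask and the luminance is a list of rows of numbers; the window clipping at the border, the variance threshold, and the function names are assumptions. Which side of the ratio threshold marks a false positive is left to the caller, since the text only states that the ratio is compared to a threshold.

```python
from statistics import pvariance


def local_variance(lum, row, col, half=1):
    """Variance of luminance in the window centred on (row, col)."""
    rows, cols = len(lum), len(lum[0])
    window = [
        lum[r][c]
        for r in range(max(0, row - half), min(rows, row + half + 1))
        for c in range(max(0, col - half), min(cols, col + half + 1))
    ]
    return pvariance(window)


def high_variance_ratio(candidate, lum, variance_threshold, half=1):
    """Fraction of candidate pixels whose window variance is high."""
    total = high = 0
    for r, row in enumerate(candidate):
        for c, inside in enumerate(row):
            if inside:
                total += 1
                if local_variance(lum, r, c, half) > variance_threshold:
                    high += 1
    return high / total if total else 0.0
```

An empty candidate yields a ratio of 0.0 rather than raising a division error, which is a convenience choice of this sketch.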

As can be discerned, the invention, through detecting 
pixels that are skin colored, removing edges, grouping 
components, classifying components, and applying a 
heuristic, thereby detects faces disposed within a digital 
image. The method can be summarized by steps S2-S16 
shown in FIG. 10. The data for performing the steps can be 
stored on a computer readable storage medium such as a 
CD-ROM or a floppy disk.

Having described the preferred embodiments it should be 
made apparent that various changes could be made without 
departing from the scope and spirit of the invention which is 
defined more clearly in the appended claims. 

What is claimed is: 

1. A method for detecting a face disposed within a digital 
image, comprising the steps of: 

providing a digital image composed of a plurality of 
pixels; 

providing a binary image from the digital image by 

detecting skin colored pixels; 
removing pixels corresponding to edges in the luminance 

component of said binary image thereby producing 

binary image components; 
mapping said binary image components into at least one 

graph; and 

classifying said mapped binary image components as 
facial and non-facial types wherein the facial types 
serve as facial candidates, 
further comprising the step of applying a heuristic, said 
heuristic including the following steps:
applying a morphological closing operation on each of 

said facial candidates to produce at least one closed 

facial candidate; 
determining high variance pixels in said closed facial 

candidate; 

determining the ratio between said high variance pixels 
and the total number of pixels in said closed facial 
candidate; and 

comparing said ratio to a threshold. 

2. A method for detecting a face disposed within a digital 
image, comprising the steps of: 

providing a digital image composed of a plurality of 
pixels; 

providing a binary image from the digital image by 

detecting skin colored pixels; 
removing pixels corresponding to edges in the luminance 

component of said binary image thereby producing 

binary image components; 
mapping said binary image components into at least one
graph; and 

classifying said mapped binary image components as 
facial and non-facial types wherein the facial types 
serve as facial candidates,
wherein said step of removing includes: 

applying a mask to a plurality of said pixels including 

an examined pixel; 
determining the variance between said examined pixel 
and pixels disposed within said mask; and 
comparing said variance to a variance threshold wherein: 
said step of removing is repeated for decreasing variance
thresholds until a size of said binary image
components is below a component size threshold; 
and 

after each step of removing, each of said binary image 
components is classified as one of the facial type and 
non-facial type. 

3. A method for detecting a face disposed within a digital
image, comprising the steps of: 

providing a digital image composed of a plurality of 
pixels; 

providing a binary image from the digital image by 

detecting skin colored pixels;
removing pixels corresponding to edges in the luminance 

component of said binary image thereby producing 

binary image components; 
mapping said binary image components into at least one 

graph; and 

classifying said mapped binary image components as 
facial and non-facial types wherein the facial types 
serve as facial candidates, wherein said step of removing
includes:
applying a mask to a plurality of said pixels including 

an examined pixel; 
determining the variance between said examined pixel 
and pixels disposed within said mask; and 
comparing said variance to a variance threshold wherein:
said step of removing is repeated for decreasing vari- 
ance thresholds until a size of said binary image 
components is below a component size threshold; 
and 

after each step of removing, each of said binary image
components is classified as one of the facial type and 
non-facial type, 

wherein said binary image components are connected. 

4. A method for detecting a face disposed within a digital 
image, comprising the steps of:

providing a digital image composed of a plurality of 
pixels; 






providing a binary image from the digital image by 

detecting skin colored pixels; 
removing pixels corresponding to edges in the luminance 

component of said binary image thereby producing 

binary image components; 
mapping said binary image components into at least one 

graph; and 

classifying said mapped binary image components as 
facial and non-facial types wherein the facial types 
serve as facial candidates, 

wherein said step of mapping comprises the following 
steps: 

representing each component as a vertex; 
connecting vertices with an edge when close in space 

and similar in color, thereby forming said at least one 

graph.

5. A method for detecting a face disposed within a digital 
image, comprising the steps of: 

providing a digital image composed of a plurality of 
pixels; 

providing a binary image from the digital image by 

detecting skin colored pixels; 
removing pixels corresponding to edges in the luminance 

component of said binary image thereby producing 

binary image components; 
mapping said binary image components into at least one 

graph; and 

classifying said mapped binary image components as 
facial and non-facial types wherein the facial types 
serve as facial candidates, 

wherein said step of mapping comprises the following 
steps: 

representing each component as a vertex; 
connecting vertices with an edge when close in space 

and similar in color, thereby forming said at least one 

graph, 

wherein each edge has an associated weight and further 
comprising the steps of: 

extracting the minimum spanning tree of each graph; 

classifying the corresponding binary image compo- 
nents of each graph as one of the facial type and 
non-facial type; 

removing the edge in each graph with the greatest 
weight thereby forming two smaller graphs; and 

repeating said step of classifying the corresponding 
binary image components for each of said smaller 
graphs until a bounding box around said smaller 
graphs is smaller than a graph threshold. 